A method for detecting and avoiding erroneous decombining in combining networks for parallel
computing systems.



A method to detect and avoid erroneous message
decombining in the switching nodes of message
combining multistage interconnection networks
used to connect the multiple processors to the multiple
memory modules of the shared memories used
in parallel computing systems. The method assigns
a unique ID to each switch and tags each combined
message with it. This ID is also tagged to the reply
messages. During decombining, the message's ID is
matched with the switch's ID; if they do not match,
the message is not decombined. According to a
further feature of the invention, it is possible to
provide sufficient information in the ID field to allow
a routing error to be detected.
A METHOD FOR DETECTING AND AVOIDING ERRONEOUS DECOMBINING IN COMBINING NETWORKS
FOR PARALLEL COMPUTING SYSTEMS



The present invention relates to data processor 
storage systems and, more particularly, to such 
storage system controls for use in a large mul- 
tiprocessor computer system having a plurality of 
processors and a plurality of individual memory 
modules. Still more particularly, it relates to the 
architecture of the Interconnection Network which 
implements and controls data flow between individ- 
ual processors and memory modules in such sys- 
tems. 

Recent studies (1 and 4) have shown that the 
memory systems of large parallel computers can 
potentially suffer from a performance-crippling de- 
fect. If a significant fraction of memory references 
are addressed to one memory address, the perfor- 
mance of the whole system can be limited by the 
capacity of that one memory. If access to that 
memory is via a Multistage Interconnection Network
(MIN) (see FIG. 2), a phenomenon known as
"tree blockage" will result in additional contention,
causing substantial delays to all users of that sys- 
tem. 

One solution to this problem is a Combining 
MIN (4, 6 and 8). The combining network causes 
references to the memory "hot spot" to be combined
en route to memory, reducing the contention
for the hot spot. A graphical example of combining 
and decombining is shown in FIG. 6. 

This combining can take place at any switch in 
any stage of the network. But in order to decom- 
bine correctly, the reply from the memory must 
decombine at the switch that generated the com- 
bined message. An error should be reported if the 
message attempts to decombine at any other 
switch. Such erroneous decombines must be 
avoided for proper system performance. As will be 
well understood, a combined packet can combine 
again. In fact one can design the combining net- 
work switch to combine more than two packets for 
the same address. 

To avoid performance degradation due to hot 
spots, a combining network has been proposed 
and designed for the RP3 experimental computer 
(1). In the RP3 combining network design, each 
"combined" message is tagged with the number of 
the stage at which it combined. This tagging is 
done by the switch that does the combining. The 
resulting reply message from the memory is also 
tagged with this stage number (i.e. the stage num- 
ber in the request message). This is done by the 
memory. In order to decombine a message at the 
correct stage, each switch first compares its stage
number to this tag. If a match takes place, then the
switch decombines the message; otherwise it just
routes the message to the next stage. It is assumed
that the routing method used by the network
guarantees (if the network is fault free) that
the messages to and from the memory are routed
through the same switches of the network. The
routing scheme used by RP3 guarantees this.

For example, in FIG. 6 two messages get combined
at switch 609 in stage 2. Therefore, the switch
609 tags the resulting combined message with
"TAG = 2". The memory 613 attaches this tag to
its reply message. This reply message passes
through switches 611, 609, and 606 and 605 in stages
3, 2 and 1, respectively, of the network. But only the
switch 609 in stage 2 decombines the message,
because its stage number matches the reply
message's tag (i.e. TAG = 2). Therefore the
decombining is done by the switch that did the
combining.
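
The stage-number comparison described above can be sketched as follows. This is a minimal illustration, not the RP3 switch logic: the function name should_decombine and the representation of the reverse path as (switch, stage) pairs are hypothetical, and only the switch and stage numbers are taken from the FIG. 6 example.

```python
def should_decombine(switch_stage, reply_tag):
    """Stage-tag rule: decombine only when the switch's stage equals the reply's tag."""
    return switch_stage == reply_tag


# Reverse path of the FIG. 6 example as (switch, stage) pairs, up to stage 2.
reverse_path = [(611, 3), (609, 2)]
decisions = {switch: should_decombine(stage, reply_tag=2)
             for switch, stage in reverse_path}
print(decisions)  # {611: False, 609: True}
```

The rule looks only at the stage number, so any switch residing in stage 2 would give the same answer for this reply.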

This scheme works well if it is assumed that no
faults exist in the combining network. But this
method does not work if it is possible to have faults
that either: (1) corrupt the tag field of a combined
or reply message before it is decombined; and/or
(2) incorrectly route a message, at any stage of the
network, before it is decombined. In the first case
an erroneous decombine takes place because the
message decombines at the stage that matches
the corrupted tag. This stage is not the stage at
which the message combined. In the second case
an erroneous decombine takes place because the
message is decombined at a switch that did not
combine the message, although the switch is in the
correct stage. For example, in FIG. 6, if due to an
error switch 611 routes the reply message to
switch 615, then switch 615 can erroneously decombine
this message. The resulting messages will
then be routed incorrectly to switches 618 and 619
and onwards to the PEs 622 and 623 attached to them.

Errors due to corrupted tags can be detected if
error detection codes and circuitry are used. For
example, in a combining network, if a byte of a
message can have only single (or odd) bit errors,
then parity can be used for each byte (e.g. RP3
Combining Network). To detect errors the switches
of this network will need to have the necessary
parity generating and checking logic. The only caution
here is that as the message is routed through
the network, the parity generation circuitry of the
switches should not mask errors. That is, if a byte is
received in error, then the switch should transmit
this byte with the parity set to indicate an error. If
this is not done, then masking of errors can result
in erroneous decombining.
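
The caution about not masking errors can be illustrated with a small sketch. It assumes one odd-parity bit per byte, as the RP3 example suggests; the function names and the idea of modelling a switch as a forward function are hypothetical and stand in for the parity generating and checking logic.

```python
def ones(byte):
    """Number of 1 bits in an 8-bit value."""
    return bin(byte & 0xFF).count("1")


def odd_parity_bit(byte):
    """Parity bit that makes the total count of 1s (data plus parity) odd."""
    return 0 if ones(byte) % 2 == 1 else 1


def parity_ok(byte, parity):
    return (ones(byte) + parity) % 2 == 1


def forward(byte, parity):
    """Retransmit a byte without masking a parity error seen on input."""
    good = odd_parity_bit(byte)
    if parity_ok(byte, parity):
        return byte, good
    # Received in error: send parity that still indicates an error downstream.
    return byte, good ^ 1


sent = 0b10110010
corrupted = sent ^ 0b00000100          # single-bit error on the link
out_byte, out_parity = forward(corrupted, odd_parity_bit(sent))
print(parity_ok(out_byte, out_parity))  # False
```

A switch that simply regenerated parity for the corrupted byte would send a byte that checks as correct, which is the masking the text warns against.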

Errors due to incorrect routing can occur when
the routing tag is in error and/or when the switch's
control circuitry is faulty. Routing errors due to 
routing tag errors can also be detected by error 
detecting codes (if used). But incorrect routing er- 
rors due to control circuitry errors cannot be de- 
tected by error detecting codes. This is because 
these codes do not detect errors in the control 
circuitry of the switch. They can detect errors only 
in the data. Therefore, if a switch's routing control 
circuitry has a fault (transient or permanent), a 
message can be incorrectly routed without any of 
its data words being corrupted. In such cases the
error will not be detected by the network and the 
message will decombine erroneously. 

In the above discussion and in the following
description of the present invention it is assumed 
that the error detecting code can detect all possible 
routing tag errors. If it cannot do so, then incorrect 
routing can take place sometimes. For example, in 
the RP3 Combining Network one bit of odd parity 
is assigned to a byte of a message. Therefore all 
single (or odd) bit errors can be detected in the 
message. But double (or even) bit errors cannot be 
detected. If the routing tag has a double (or even) 
bit error, then the message can be routed incorrectly
and this error will not be detected by the
switch. 
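
The limitation of a single parity bit can be checked with a short sketch. It assumes one odd-parity bit per byte as stated above; the 8-bit routing tag value and the function names are hypothetical.

```python
def odd_parity_bit(byte):
    """Parity bit that makes the total count of 1s (data plus parity) odd."""
    return 0 if bin(byte & 0xFF).count("1") % 2 == 1 else 1


def parity_ok(byte, parity):
    return (bin(byte & 0xFF).count("1") + parity) % 2 == 1


routing_tag = 0b00000101
parity = odd_parity_bit(routing_tag)

single_error = routing_tag ^ 0b00000001   # one bit flipped
double_error = routing_tag ^ 0b00000011   # two bits flipped

print(parity_ok(single_error, parity))  # False
print(parity_ok(double_error, parity))  # True
```

The double-bit error changes the routing tag but still passes the parity check, so the switch would route on the wrong tag without noticing.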

Therefore, it is apparent that the RP3 combin- 
ing network switch design cannot always detect 
and avoid erroneous decombining due to incorrect 
routing of messages. It is important to avoid such 
errors because they can result in incorrect comput- 
ing. Further, it is important to detect and avoid 
such errors because both the user applications and 
the operating system are expected to use the
Combining Network (e.g. RP3). Erroneous decom- 
bines can have crippling effects on the operating 
system and multi-user parallel processor systems. 

European Patent Application 89 105 326.6 en- 
titled "A Hardware Mechanism For Automatically 
Detecting Hot-Spot References And Diverting 
Same From Memory Traffic In A Multiprocessor 
Computer System" discloses an alternative way to 
help reduce memory blockages. Instead of combin- 
ing and decombining messages in a single mul- 
tistage interconnection network as with the present 
invention, the above application utilizes two separate
networks, one for low latency and another
capable of handling high contention traffic, and
diverts those messages which are detected as
"hot-spots" over the second network. The system also
provides means for detecting and removing hot- 
spots on a dynamic basis to efficiently control the 
message routing function. 

The following four articles generally describe 
the attributes of the previously referenced experimental
high-speed multiprocessor computing system
known as the RP3 having a large shared
memory. All four of these articles appear in the
Proceedings of 1985 International Conference on
Parallel Processing, August 20-23, 1985.

1. Pfister, G.F.; Brantley, W.C.; George,
D.A.; Harvey, S.L.; Kleinfelder, W.J.; McAuliffe,
K.P.; Melton, E.A.; Norton, V.A.; and Weiss, J. "The
IBM Research Parallel Processor Prototype (RP3):
Introduction and Architecture," pp. 764-771. This
article is tutorial in nature and describes an overall
multiprocessor system in which the present invention
has particular utility.

2. Norton, V.A. and Pfister, G.F. "A Methodology
for Predicting Multiprocessor Performance,"
pp. 772-781. This article is also tutorial in nature
and describes methods which attempt to predict a
given multiprocessor's performance and indicates
some of the considerations used in predicting various
types of memory blockages, etc., which can
occur to seriously detract from the overall system
performance. It is noted that these general concepts
for evaluation and monitoring were instrumental
in recognizing the need for the present
invention.

3. McAuliffe, K.P.; Brantley, W.C.; and
Weiss, J. "The RP3 Processor/Memory Element,"
pp. 782-789. This article describes a memory element
for such a system and broadly describes
some of the memory design considerations which
affect overall memory performance, and is relevant
to the present invention in that it is background
information for the memory design of a large multiprocessor
system.

4. Pfister, G.F. and Norton, V.A. "Hot-Spot
Contention and Combining in Multistage Interconnection
Networks," pp. 790-797. This article generally
discusses the problems and interconnection
network for a large multiprocessor system such as
the RP3 and suggests the use of two separate
interconnection networks over which memory references
may be selectively sent to at least alleviate
the Hot-Spot problem.

The following four papers/notes comprise public
domain articles discussing combining networks
and combining switch designs. Their relevancy is
discussed generally below and specifically in the
body of the specification.

5. Gottlieb, A., et al, "The NYU Ultracomputer
- Designing a MIMD, Shared Memory Parallel
Machine", IEEE TC, February 1983, pages 175-189.
This paper is a high level description of the NYU
Ultra-Computer. It also generally discusses the use
of combining messages in such a multiprocessor
system.

6. Susan Dickey, Richard Kenner, Marc Snir
and John Solworth, "A VLSI Combining Network for
the NYU Ultracomputer", Proceedings of the International
Conference on Computer Design, 1985.
This paper describes the routing scheme and link
protocols for a combining network (i.e. FIG. 5). It 
also describes a VLSI implementation of a combin- 
ing switch. This VLSI implementation description 
also indicates the type of information that needs to 
be maintained with the switch and also information 
that the forward and reverse path switches ex- 
change. 

7. Susan Dickey and Richard Kenner,
"Specification for a Combining-Switch", New York
University, Note of November 18, 1986. Published 
Note available from NYU Ultra-Computer Project. 
This paper specifies message formats for a design 
of a combining switch, the combinable operations 
supported and the use of the switch level in the
packet format to indicate the decombining switch. 
In the present embodiment this switch level is 
referred to as the stage number. The present in- 
vention is a significant improvement over a scheme 
that uses only a stage number (switch level), to 
indicate which switch should decombine a com- 
bined message. 

8. Yarsun Hsu, Wally Kleinfelder and C. J.
Tan, "Design of a Combining Network for the RP3
Project", 1987 International Symposium on VLSI
Technology, (Systems and Applications), May 13-15, 1987.
This paper describes a design for a combining
network switch. It describes the combining switch
structure for the forward and reverse paths, a routing
scheme used by the network, message formats
(i.e. FIG. 5) for forward and reverse path packets, and
various combinable operations.

The following four patents were found pursuant 
to a USPTO prior art search and are deemed to be 
the closest references found although they are not 
overly relevant to the present invention as will be 
apparent from the following discussion. 

U.S. Patent No. 4,081,612 relates to a method 
for the building-up of a routing address, consisting 
of routing words that are associated with the 
switching nodes of a digital telecommunication net- 
work. This method suggests that the routing ad- 
dress be generated by compounding the address 
field with the address of each segment of the 
routing path. 

The significant differences between this and the
present invention are: (1) the application domain of
the present invention is not telecommunications, 
but parallel processing systems (computing sys- 
tems); and (2) the present invention is not directed 
to routing address generation but to a method to 
avoid erroneous decombining in combining net- 
works, which can be used in information/computing 
systems. 

U.S. Patent No. 4,569,041 discloses a network 
for building a circuit/packet switching loop network, 
that can compose and route composite packets. 



The present invention is significantly different in 
that it discloses a technique to be used in mul- 
tistage interconnection networks. Also the present 
invention does not consider routing packets in a
circuit switched environment.

U.S. Patent No. 4,577,311 relates to a packet 
based telecommunication system wherein each 
packet has, inserted in an address field, a se- 
quence of addresses assigned to the successive 
switching networks included in a route from a packet
sender station to a packet receiver station.

The significant differences between this and the
present invention are: (1) the application domain is
not telecommunications, but parallel processing
systems (computing systems); and (2) the present
invention is not directed to routing address generation,
but to a method for avoiding erroneous decombining
in combining networks, which can be
used in information/computing systems.

U.S. Patent No. 4,168,400 discloses a checking
device and an eliminating and switching device for
eliminating packets having erroneous address and
data fields. The patent uses an error detection
code to detect errors in the address field and
another error detection code to detect errors in the
data field of a packet. These error detection codes
help detect errors that may be introduced in these
fields while the packet is being transmitted from
one node of the network to another.

The present invention does not preclude the
use of such error detection codes to detect errors
in these fields that may have been introduced
during transmission of the packet. It should be
noted that these codes only help detect errors that
are introduced by the data path of the circuitry (i.e.
the circuitry and lines used to transmit the bits of
the address, data and these error code fields).
These codes cannot detect errors that are generated
by the control circuitry of the node (e.g.
incorrect routing of a packet, once a node has
received it without error). The present invention
discloses methods for detecting such control circuitry
errors for combining networks. Detecting
such control circuitry errors is important for the
correct operation of computing systems using combining
networks and goes beyond conventional error
detection/correction.

The following is a list of references found pur- 
suant to the aforementioned prior art search which 
generally constitute background art, but are not
deemed sufficiently relevant to warrant specific dis- 
cussion. 

U.S. Patent 4,413,318 (IBM) 
U.S. Patent 4,207,609 (IBM) 
U.S. Patent 4,354,263
U.S. Patent 4,153,932 

It is a primary object of the present invention to 
provide an improved multistage interconnection
network architecture for use in a large shared
memory multiprocessing system. 

It is a further object to provide such an ar- 
chitecture which improves network efficiency by 
reducing "blockages". 

It is another object of the invention to provide
such an architecture which improves the reliability 
of such networks having the ability to combine and 
decombine messages. 

It is yet another object of the invention to 
provide such an interconnection network in con- 
junction with a Combining/Decombining feature 
which detects and avoids erroneous "decombines" 
of messages. 

It is a further object of the invention to provide 
such a network architecture which utilizes addi- 
tional data in the message header together with 
error detection circuitry in the network to detect 
and thus avoid erroneous message decombinations 
in the switching network. 

The objects, features and advantages of the 
present invention are accomplished by an im- 
proved method for handling message 
Combining/Decombining in a multistage interconnection
network which comprises assigning a
unique identification number (ID) to every switch in the
network, such that whenever a message requires combining
at a particular switch node the unique ID of the
combining switch node is attached to the message
in the message header. When the memory generates
a response to this message, the memory
copies this ID into the response message's header.
When the message is returned to the interconnec- 
tion network by a memory element for potential 
decombining, this ID in the header is compared 
with the ID of each receiving switch node. If the IDs 
match, then decombining is done by the switch. If 
the IDs do not match, then decombining is not 
done by that switch. This way erroneous decom- 
bining is avoided. An error is detected when a 
processor does not see a zero ID (i.e. the NOC ID) 
in any message it receives from the combining 
network. A non-zero ID at a processor indicates 
that decombining was not done. This is an error. 

According to a first aspect of the invention a 
single, unique switch ID is utilized in the system 
which will clearly indicate an erroneous attempted 
decombination and, according to a further embodi- 
ment of the invention, both a switch level ID as well 
as an individual switch ID are concatenated in the
header to give a more sophisticated form of error 
detection for subsequent diagnostic use by the 
system control in the sense that it provides more 
information to the system when an error is de- 
tected. In the second embodiment a combining 
switch is able to detect the error and possibly take 
remedial action whereas in the first embodiment 
only the processor can detect the error and take
appropriate action.

FIG. 1 comprises a high level functional
block diagram illustrating the typical organization of
a large parallel processor system.

FIG. 2 comprises a functional block diagram
illustrating a typical multistage interconnection network
(MIN) as would be utilized in the parallel
processor system illustrated in FIG. 1.

FIG. 3 comprises an example (using the
network of FIG. 2) illustrating a communication path
between a particular processor and a particular
memory element of such a parallel processor system
in the interconnection network.

FIG. 4 comprises a high level functional diagram
illustrating the overall organization and flow
path of a combining network.

FIG. 5 comprises a functional diagram illustrating
a particular switch as would be utilized in
the combining network shown in FIG. 4.

FIG. 6 comprises a diagram illustrating a
message combining and a subsequent decombining
operation in the overall MIN of FIG. 4 and also
illustrating an erroneous decombining operation.

FIG. 7 comprises a diagram illustrating the
overall structure and content of a network message.

FIG. 8 comprises a diagram illustrating the
structure and content of the Header field of the
message of FIG. 7.

FIG. 9 comprises a flow chart of the operations
which must be performed in the forward
direction at each switch node in a combining network
incorporating the features of the present invention.

FIG. 10 comprises a flow chart of the operations
which must be performed in the reverse
direction at each switch node in such a combining
network when returning messages from a memory
element to a requesting processor in accordance
with the teachings of the present invention.

FIG. 11 comprises the switching network of
FIG. 2, without showing the specific interconnections,
illustrating a first tagging method for accomplishing
the objectives of the present invention.

FIG. 12 is a figure similar to FIG. 11 illustrating
a second tagging method for accomplishing the
objectives of the present invention.

Two closely related methods for preventing
erroneous decombining, in accordance with the
teachings of the present invention, will be described.
Some overall assumptions and a high level
overview of the invention will first be described,
followed by a detailed description of the above two
methods.

It is assumed here that some method exists by
which a switch in a Combining Network can inform
the host system that an error was detected while
decombining a message. The exact method used
is not relevant to this invention. One way this can
be done is by setting an error-indicating bit/field in
the message (assuming such a field is provided).
Then the processors can test this bit/field in every 
decombined message they receive. If it is set, the 
processor can initiate the necessary actions (such 
as retry, interrupt, etc.). It is also assumed that 
error detecting codes and appropriate error detect- 
ing hardware are used to detect errors in the mes- 
sage words. 

According to the first method, each switch in 
the Combining network is assigned a unique iden- 
tification number (ID). This ID is unique with re- 
spect to the other switches of the network. For 
example, if there are 12 switches in a Combining 
Network, then the switches can be assigned num- 
bers from 1 through 12 (see FIG. 11). The number 
0 can be reserved to indicate that combining did
not take place. It is of course required that each 
switch be informed of its ID at network initialization 
time. Whenever a switch combines messages, the 
resulting combined message is tagged with this 
switch's ID. The combined message must, of 
course, be provided with a field in which this ID
can be written. When the memory services a com- 
bined message, it also tags the reply message with 
this same ID. As this reply message is routed back, 
each switch compares its ID with the ID in the 
message. A switch decombines the message only 
if these IDs are the same. If they do not match, the 
message is routed on to the next stage. It is thus
apparent that an erroneous decombine cannot take
place because each switch has a unique ID.
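
The first method can be sketched as a walk of a reply message along its reverse path. This is a minimal illustration under stated assumptions: the function names, the switch IDs on the example paths, and the choice to reset the ID field to 0 after decombining are all hypothetical (the reset ignores messages that combined more than once); only the reserved value 0 and the ID comparison come from the text.

```python
NOC = 0  # reserved ID meaning "no combining took place"


def traverse(reply_id, path):
    """Carry a reply through the switch IDs in `path`.

    Returns the per-switch actions and the ID the processor finally sees.
    """
    log = []
    for switch_id in path:
        if reply_id != NOC and reply_id == switch_id:
            log.append((switch_id, "decombine"))
            reply_id = NOC  # assumption: decombined replies carry the NOC ID again
        else:
            log.append((switch_id, "route"))
    return log, reply_id


def processor_sees_error(reply_id):
    """A non-zero ID at the processor means decombining was not done."""
    return reply_id != NOC


# Correct reverse path: the reply tagged with ID 6 reaches switch 6.
log, seen = traverse(6, [9, 6, 1])
print(log, processor_sees_error(seen))

# Misrouted reply: it never reaches switch 6, so no switch decombines it.
bad_log, bad_seen = traverse(6, [9, 7, 2])
print(bad_log, processor_sees_error(bad_seen))
```

In the misrouted case no switch decombines the message, and the error becomes visible only at the processor, which matches the behaviour the text attributes to this method.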

It should be noted that the method presented
above only prevents erroneous decombining from taking
place. It does not detect an error. This is because
in this method a switch cannot tell if a message
should have been decombined at some other
switch before it reached the switch. In order to
detect such errors the second method can be
used.

According to the second method the switch's 
ID is constructed by concatenating two fields 1213 
and 1214 (see FIG. 12). The first field 1213 in- 
dicates the stage in which the switch resides, while 
the second field 1214 gives the unique number of 
the switch within that stage (i.e. its row number). 
For example, if there are "L" stages of switches in
the network and "N" switches per stage, then
each stage can be assigned a unique number from
1 through "L" and each switch in a stage can be 
assigned a unique number from 1 through "N". 
The stage number 0 and switch number 0 can be 
used to indicate that a message did not combine 
(NOC). The switch ID can be constructed by con- 
catenating the bit representation of these two num- 
bers. 
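
The concatenation of the two fields can be sketched with simple bit operations. The field width (row_bits) and the function names are assumptions made for illustration; the text fixes only the order of the fields (stage first, then switch number) and the reserved value 0 for each.

```python
def make_switch_id(stage, row, row_bits):
    """Concatenate the stage field and the row field into one ID."""
    if not 0 <= row < (1 << row_bits):
        raise ValueError("row does not fit in row_bits")
    return (stage << row_bits) | row


def split_switch_id(switch_id, row_bits):
    """Recover (stage, row) from a concatenated ID."""
    return switch_id >> row_bits, switch_id & ((1 << row_bits) - 1)


# Rows 1 through 4 plus the reserved 0 need 3 bits.
ROW_BITS = 3
switch_id = make_switch_id(stage=2, row=1, row_bits=ROW_BITS)
print(bin(switch_id))                        # 0b10001
print(split_switch_id(switch_id, ROW_BITS))  # (2, 1)
print(make_switch_id(0, 0, ROW_BITS))        # 0
```

With this layout the NOC value (stage 0, switch 0) is the all-zero ID, so a processor can still test the whole field against zero.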

When the second method is used, the switches
decombine a message only if their ID matches the
message's ID. That is, both fields of the two
IDs should match. If either of these fields does not
match, then the message is not decombined and is
routed to the next stage. But if the stage number
fields match and the switch number fields do not
match, then it is flagged as an error. This is because
this mismatch indicates that the reply message
is at the correct stage, but at the wrong
switch. Since these Combining Networks do not
have interconnections between the switches of a
stage, this message can never be decombined.
Therefore, an incorrect routing error has taken
place. Thus it may be seen that this ID numbering
scheme not only avoids erroneous decombining at
a switch, but also detects it at the switch.
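
The three outcomes of the second method can be written as a small decision function. This is a sketch, not the switch design: IDs are modelled as (stage, row) tuples, the function name is hypothetical, and the row numbers given to the stage-2 switches are invented for the example. How a flagged message is handled afterwards is left open, as in the text.

```python
NOC = (0, 0)  # stage 0, switch 0: the message did not combine


def reverse_action(switch_id, reply_id):
    """Decide what a switch does with a reply under the second method."""
    switch_stage, switch_row = switch_id
    reply_stage, reply_row = reply_id
    if reply_id == NOC or switch_stage != reply_stage:
        return "route"       # not this stage's business: pass it on
    if switch_row == reply_row:
        return "decombine"   # both fields match
    return "error"           # right stage, wrong switch: misrouted


reply = (2, 1)                        # combined at stage 2, row 1
print(reverse_action((3, 1), reply))  # route
print(reverse_action((2, 1), reply))  # decombine
print(reverse_action((2, 3), reply))  # error
```

The third call models a reply that arrives at another switch of the correct stage, the situation the stage-only tag could not distinguish.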

It should be noted here that the designer is 
free to choose the ID assignment to the switches, 
as long as the above guidelines are followed. Also,
error detection codes, as discussed earlier, can be
used to detect errors in these ID fields. 

To facilitate an understanding of this invention,
a parallel processor system can be seen to contain
three distinct elements: processor element (PE)
101, memory element (ME) 102 and an interconnection
network 103 (see FIG. 1). A parallel processor
system consists of several processors and
memory elements that are connected to each other
via the interconnection network. One or more networks
can be used for this interconnection. In order
to communicate across this network, a PE 101
sends a message on cable 104 to the network 103.
The network routes this message to the required
ME. The memory element 102 receives this message
over cable 105, processes it and sends a
reply message over cable 106 to the
network 103. The network then routes this message
to the required PE. The PE receives this message
on cable 107 and processes it. It should be noted
that a network can also be used to communicate
between the PEs.

The details of the PE and ME are not relevant
to this invention and are accordingly not described
in detail. For a more complete description of the
general characteristics of such multiprocessor systems,
reference is again made to references (1
and 3).

Multistage interconnection networks (MINs) are
very often used by parallel processing systems. A
generic example of a MIN is shown in FIG. 2. FIG.
2 shows an 8x8 size MIN; it connects 8 PEs to 8
MEs. It is readily possible to design MINs of other
sizes. MINs use several stages of switches 203 that
are interconnected in some specified order via links
such as 204 and 205. The PEs and MEs are also
interfaced to the MIN, such as by links 201, 202,
206 and 207.

The links 201, 202, 204, 205, 206 and 207 of
the MIN are busses that are used to communicate
messages. Depending on the implementation of the 
MIN, these links can be unidirectional or bidirec- 
tional. The details of the links and the protocol 
used to communicate across them are not specifi- 
cally disclosed as their details would be well known 
to those skilled in the art of this invention. An 
example of a link protocol is given in reference (6). 

The switch 203 of the MIN is basically respon- 
sible for routing the messages it receives on its 
input links 201 and 202 to its output links 204 and 
205. In the example of FIG. 3, the path taken by
messages proceeding from PE4 to ME0 is shown.
PE4 sends its messages to switch 302 via link 301.
The switch 302 routes this message and sends it 
across its output link 303 to switch 304. The switch 
304 routes this message across its output link 305 
to switch 306. The switch 306 then routes this 
message across link 307 to ME0. A similar commu- 
nication path is constructed to route the reply from 
ME0 to PE4. Note that these communication paths 
are constructed dynamically by the switches for 
each message, as is well known in the art. Also, 
after each message has been sent across a switch, 
the switch disconnects this path and processes the 
next message (if any). A MIN can route many
messages concurrently through it. The MIN exam- 
ple of FIG. 2 uses a 2x2 size switch (i.e. the switch 
has 2 input and 2 output links). It is also well known 
to use other size switches for MINs. The details of 
the switch are not relevant to the invention. Suit- 
able switches are disclosed and discussed in re- 
ferences (6, 7 and 8). 

The way in which the links (201, 202 and 204
to 207) connect the switches of the MIN is called the
topology of the MIN. FIG. 2 shows one such topol- 
ogy. Other topologies can also be used. The de- 
tails of a MIN's particular topology are, similarly, 
not relevant to the present invention. 

Besides routing messages, the switches of the 
MIN can be designed to process the messages. 
One such processing function that can be designed 
is called combining. MIN networks that support 
combining are called Combining Networks. A suit- 
able combining network will now be described. 

An example of the combining function and an 
example of combining in combining networks will 
first be described. Then the organization of such a 
network will be set forth. Following this the or- 
ganization of the combining network switch is de- 
scribed. 

The main reason for combining is to reduce
the message traffic seen by the MEs, as described
previously (1 and 4). This message traffic is reduced
by generating one request for a group of
similar requests to the same address in the ME.
For example, if 2 load (LD) operations for the same
address A are requested by the PEs of a parallel
processor system, then using the combining function
the system sends only one LD request to the
ME hosting address A. (Note that two LD requests to
the memory are now reduced to one LD request,
thus reducing the traffic to the ME by 50% in this
case.) The ME executes this request and generates
a reply message. This reply message is decombined
by the system to generate the reply messages
for the two LD requests that were originally
requested by the PEs. The resulting reply messages
are then sent to the PEs. This example used
load memory operations to explain combining, but
combining can be done for many other types of
memory operations too. A number of such combinable
operations are suggested in (3, 4 and 8).

In combining networks such combining operations
are done by the switches of the network.
FIG. 6 shows how the combining example explained
above is executed by a combining network.
An 8x8 MIN, using 2x2 switches, as shown in FIG.
2, is used as an example of such a network.
Combining can be supported in MIN networks of
other sizes, implemented using switches of other
sizes. For the combining network example described
here, the communication path from the PEs
to the MEs is referred to as the Forward path, and
the communication path from the MEs to the PEs
as the Reverse path. In the FIG. 6 example, PEs
601 and 602 generate the LD requests (<LDA>)
and send them to switches 605 and 606 across the
links 603 and 604, respectively. In this example, no
combinable messages were detected by switches
605 and 606. Therefore, these switches 605 and
606 only route these LD request messages across
links 607 and 608, respectively, to the required
switch 609 in stage-2. It is assumed here that these
LD messages arrive at switch 609 in time to be
combined. It is not necessary for messages to
arrive simultaneously in order to be combined.
Switch 609 detects that these messages can be
combined and therefore combines them. The resulting
single request LD message (<LDA>) is then
sent across link 610 to the next stage's switch
611. No combinable message is detected by
switch 611; therefore it routes the LD message
across link 612 to the required ME 613.

The ME 613 executes the LD operation from
location A in its memory. Since this memory location
has the value 5 stored in it, this value is read.
The ME 613 generates the reply message (<reply
5>) and sends it across link 612 to switch 611.

Assuming no errors have taken place in the
system, the switch 611 checks to see if it had
combined the request message for this reply message.
Since it did not, it only routes the reply
message and sends it across link 610 to switch
609. Switch 609 recognizes that it had combined
two messages into one and that the reply message
received is for this resulting combined message.
Therefore, it decombines the reply message, that is,
it generates the required reply messages for the
LD messages it had combined and sends these
two reply messages across links 607 and 608. The
messages are received by switches 605 and 606
respectively. Since these switches 605 and 606 did
not combine the request messages for these reply
messages, they only route the messages across links
603 and 604 to the required PEs 601 and 602
respectively.
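
The FIG. 6 walk-through can be illustrated with a minimal sketch. It is not the switch design of references (6, 7 and 8); the names Request, combine and decombine, and the use of the incoming link numbers 607 and 608 as return destinations, are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    op: str        # memory operation, e.g. "LD"
    address: str   # memory address, e.g. "A"
    source: int    # hypothetical: link on which the request arrived


def combine(requests):
    """Merge requests that share an operation and an address.

    Returns the requests to forward and a table recording, for each
    (op, address) pair, the sources whose requests were merged.
    """
    wait_table = {}
    for req in requests:
        wait_table.setdefault((req.op, req.address), []).append(req.source)
    forwarded = [Request(op, addr, sources[0])
                 for (op, addr), sources in wait_table.items()]
    return forwarded, wait_table


def decombine(op, address, value, wait_table):
    """Generate one reply per source recorded when the requests were combined."""
    return [(source, value) for source in wait_table.pop((op, address))]


memory = {"A": 5}  # location A holds the value 5, as in the example
forwarded, table = combine([Request("LD", "A", 607), Request("LD", "A", 608)])
replies = decombine("LD", "A", memory["A"], table)
print(len(forwarded), replies)
```

Two LD requests enter, one request is forwarded toward the ME, and the single reply is expanded back into two replies, one for each recorded source.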

It is possible that errors can take place in such 
a system and affect the routing of messages. 
These errors can sometimes result in erroneous 
decombining. For example, in FIG. 6, assume an 
error occurred in the routing logic of switch 611 
when it is routing the reply message. Then it is 
possible that the switch erroneously routes the 
reply message across link 614, instead of link 610. 
(This is illustrated by the dashed lines). The reply 
message is now received by switch 615, instead of 
609. If only the stage number is used to identify if 
a switch needs to decombine the reply message, 
as is the case in the known prior art, then switch 
615 will decombine this message. This is because 
609 combined the message in the Forward path, 
and since this switch is in stage-2, the resulting 
combined request message was tagged as com- 
bined in the second stage of the network. In the 
Reverse path both these switches 609 and 615 are 
also in stage-2, therefore the erroneously routed 
message can be decombined erroneously by 
switch 615. The resulting reply messages are routed
across links 616 and 617 to switches 618 and 619
respectively. These switches 618 and 619 then
route these reply messages across links 620 and
621 to PEs 622 and 623 respectively.
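
The difference between tagging by stage number and tagging by switch can be shown with a short sketch. The two check functions and the use of the reference numerals 609 and 615 as switch identifiers are assumptions for illustration only.

```python
def decombines_by_stage(tag_stage, switch_stage):
    """Prior-art style check: only the stage number is compared."""
    return tag_stage == switch_stage


def decombines_by_id(tag_id, switch_id):
    """Check on a per-switch identifier (reference numerals used as IDs here)."""
    return tag_id == switch_id


# Both switches of the FIG. 6 example are in stage-2.
STAGE = {609: 2, 615: 2}

combined_at = 609    # switch that combined the requests on the Forward path
misrouted_to = 615   # switch that wrongly receives the reply

stage_scheme = decombines_by_stage(STAGE[combined_at], STAGE[misrouted_to])
id_scheme = decombines_by_id(combined_at, misrouted_to)
print(stage_scheme, id_scheme)
```

With the stage-only tag, switch 615 accepts the misrouted reply for decombining; with a per-switch identifier it does not.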

Combining networks are MIN networks that are 
logically organized as shown in FIG. 4. They con- 
sist of two MINs, a Forward network 402 and a 
Reverse network 405. The PEs send their request 
messages to the Forward network 402 across the 
network's input links 401. The Forward network
combines these request messages as needed and 
routes them to the MEs across its output links 403. 
The MEs execute the operations requested by the 
"request" messages and then generate "reply" 
messages. These replies are then sent to the Re- 
verse network 405 across the network's input links 
404. If request messages were combined by the 
Forward network 402, then the resulting reply mes- 
sages must, of course, be decombined by the 
Reverse network 405. The resulting decombined 
messages, or the single reply message (if not 
decombined), are delivered to the required PEs 
across the Reverse network's 405 output links 406. 

The physical organization of these combining
networks is dependent on the implementation. By
this it is meant that it is possible to design the
Forward and Reverse networks as two separate
MINs. Alternatively, it may be designed as one
physical MIN, whose switches support these Forward
and Reverse network operations, while the
links support bidirectional communication. In either
implementation, the combining operations need a
switch in the Forward network to communicate with
its companion switch in the Reverse network. Similarly,
communication also is needed between the
Reverse and the Forward networks. This communication
is done across the inter-network set of
links 407. This communication between the Forward
and Reverse network switches is needed to
communicate necessary decombining information
between these switches (6, 7 and 8).

It should be noted that combining networks
require that the combined request messages and
their reply messages be routed via the
same communication path in these Forward and
Reverse networks. That is, the reply message of a
combined request message should always be routed
via the switches of the Reverse network that
are the companions of the switches in the Forward
network via which the request message was routed.

The combining network switch logically consists
of a forward switch 504 and a companion
reverse switch 509, as shown in FIG. 5. Again, for a
general description of such switching networks and
switches, reference is made to (6, 7 and 8). The
Forward network switch 504 receives the request
messages generated by the PEs across its input
links 501 and 502. The forward switch 504 determines
when messages are combinable and combines
these messages. If a message is combined,
then the forward switch 504 sends some decombining
information to its companion reverse switch
509 across link 513. It is possible for the companion
reverse switch 509 to send some flow control
or other control information to the forward switch
504 too. This information is exchanged across link
514. (It should be noted that links like 513 and 514
make up the inter-network set of links 407 in FIG.
4.) Once combining is done, a single request message
is generated by the forward switch 504. This
request message is tagged with the forward
switch's ID 503 to indicate where this message was
combined. It should be noted that the forward
switch's ID 503 is the same as the reverse switch's
ID 510. The forward switch 504 then routes the
resulting request message to a switch in the next
stage, or to an ME if it is in the last switch level
(closest to memory), across one of its output links
505 or 506. If combining is not done, then the
forward switch 504 does not tag the request message.
The forward switch 504 only routes the request
message to a switch in the next stage, or to
an ME, across one of its output links 505 or 506.

The Reverse network switch 509 receives the
reply messages generated by the MEs across its
input links 507 and 508. The reverse switch 509
determines when messages are to be decombined.
A message is decombined if the message's ID
matches the reverse switch's ID 510. If a reply
message is decombined, the reverse switch 509
generates the required reply messages and sends
them across the required output links 511 and/or
512. If a reply message is not decombined, then
the reverse switch sends the reply message it
received across the required output link 511 or
512.

The physical organization of the combining
switch depends on the implementation. That is, it is
possible for the forward switch 504 and the reverse
switch 509 to be designed as separate modules
(chips or cards), or they can be designed to be one
module. Further details of the design or implementation
of these switches are not relevant to this
invention and would be obvious to one skilled in
the art (6, 7 and 8).

The messages that are routed by MINs are
organized as shown in FIG. 7. Each message can
be logically partitioned into a header portion 701
and an information portion 702. The size of each of
these portions and the message itself is not specific
to the present invention. The use and content
of such messages are well known in the art.

The structure of a message of the combining
network is shown in FIG. 7. This message's header
portion is partitioned into the following fields (see
FIG. 8):

Routing field (801): This field stores the information
needed by the Forward and Reverse
networks to route the request and resulting reply
messages.

ID field (802): If a message is combined,
then this field in the request message is initialized
by the forward switch 504 (see FIG. 5) with its ID
503.

Error field (803): This field is set by the
forward switch 504, the ME and the reverse switch
509, whenever they detect an error in either receiving
or processing the message.

Other fields (804): This field is used to store
any other information that may be needed by the
design of the combining network being used. Details
and the use of this field are not relevant to this
invention.
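
The message layout of FIGS. 7 and 8 can be modelled as a small data structure. This is only a sketch: the field types, the string encoding of the routing field, and the switch ID value 9 are assumptions, since the text leaves field sizes and encodings open.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Header:
    routing: str                       # routing field 801 (encoding assumed)
    switch_id: Optional[int] = None    # ID field 802; None until combined
    error: bool = False                # error field 803
    other: dict = field(default_factory=dict)  # other fields 804


@dataclass
class Message:
    header: Header       # header portion 701
    information: bytes   # information portion 702


msg = Message(Header(routing="101"), information=b"LD A")
# A forward switch with the hypothetical ID 9 combines the message
# and writes its own ID into the ID field.
msg.header.switch_id = 9
print(msg.header)
```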

An overview of the operation of the combining
network's forward switch 504 (see FIG. 5) is shown
in FIG. 9. It should be noted that only the control
information relevant to the invention is shown in
FIG. 9. The request message is received by the
forward switch 504 in block 901 and is checked by
block 902 to determine if it is combinable with
other messages that may be buffered within the
switch 504 and/or those that are received on its
other input links. If the message is combinable
(903), as determined in block 902, then combining
is done in block 905. If the message is not combinable
(904), block 902 routes the message via block
907 to the output links of the switch. If a message
is combined in block 905, then the message's ID
field 802 (see FIG. 8) is initialized (block 906) with
the forward switch 504's ID 503. In order to route a
request message, the required output link across
which the message must be routed is determined
by block 907 via the routing field in the message
header (FIG. 8) and then this output link is
checked by block 908 to determine if it is free or
not. If this output link is free (909), that is, it is not
being used to send another message, then this
message is sent across this output link. If this
output link is busy (910), that is, it is being
used to send another message, then the
present message is not sent. In this case the
switch waits for the next switch cycle (i.e., a message
transmission time) in block 911 and then
attempts to send this message again via block 908.

As is well understood by those skilled in the
art, such combining networks must store, when
appropriate, a great deal of internal information
about each combined message at each switch.
Thus, each set of messages combined at a switch
results in a set of data which must be stored
therein. This would normally include at least routing
information of the packets that are combined so
that, when the combined message is returned from
memory, it can be decombined and the parts sent
down the proper paths to the originating processors.
This and other data is stored at each switch
whenever combining is done by the switch and
must be entered, altered or deleted when a new
message is received from a processor and combined,
or data is returned from memory and must
be decombined. This data at each switch is updated,
etc., in the blocks 905 and 1005 as indicated
by the "update internal information" label
in each of these blocks. (References (6, 7 and 8)
describe and discuss the use and function of such
internal information in great detail.)
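
The forward-switch flow of FIG. 9 can be sketched as follows. This is an illustrative model under stated assumptions, not the hardware of references (6, 7 and 8): the class ForwardSwitch, its methods, the combinability test (same operation and address), and the switch ID 9 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Req:
    op: str
    address: str
    out_link: int                     # output link derived from the routing field
    switch_id: Optional[int] = None   # ID field 802; None if never combined


class ForwardSwitch:
    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.buffered = []   # messages waiting for an output link
        self.internal = {}   # internal information: arrival links per message

    def receive(self, msg, came_from):
        # Blocks 902-906: combine with a buffered message when possible.
        for held in self.buffered:
            if (held.op, held.address) == (msg.op, msg.address):
                held.switch_id = self.switch_id   # block 906: tag with own ID
                self.internal[(held.op, held.address)].append(came_from)
                return
        # Not combinable: buffer it (this sketch records its arrival link too).
        self.internal[(msg.op, msg.address)] = [came_from]
        self.buffered.append(msg)

    def cycle(self, busy_links):
        # Blocks 907-911: send on free links, keep the rest for the next cycle.
        sent, waiting, used = [], [], set(busy_links)
        for m in self.buffered:
            if m.out_link in used:
                waiting.append(m)
            else:
                used.add(m.out_link)
                sent.append(m)
        self.buffered = waiting
        return sent


sw = ForwardSwitch(switch_id=9)
sw.receive(Req("LD", "A", out_link=610), came_from=607)
sw.receive(Req("LD", "A", out_link=610), came_from=608)
first = sw.cycle(busy_links={610})    # link busy: nothing is sent
second = sw.cycle(busy_links=set())   # link free: the combined request is sent
print(len(first), len(second), second[0].switch_id)
```

The second request is absorbed into the first, the surviving request carries the switch's ID, and it leaves the switch only in a cycle in which its output link is free.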

An overview of the operation of the combining
network's reverse switch 509 (see FIG. 5) is shown
in FIG. 10. It should be noted that only the control
information relevant to the invention is shown in
FIG. 10. The ID field 802 (see FIG. 8) of the
message received in block 1001 is compared in
block 1002 with the switch 509's ID field 510. If a
match is found, line 1003 is active and the reply
message is decombined and the required reply
messages are generated in block 1005. The resulting
reply messages are then routed (1006). If the IDs
do not match (1004), then the reply message received
is checked for errors (1011 and 1012).
If errors are detected (1013), then the reply
message's Error field 803 is set (1015) and then the
message is routed by block 1006. This Error field
803 is not set if no errors are detected (1014). In
order to route a reply message, the required output
link across which the message must be routed is
determined in block 1006 and then this output link
is checked in block 1007 to determine if it is free
or not. If this output link is free (1008), that is, not
being used to send another message, this message
is sent across this output link. If this output
link is busy (1009), that is, being used to send
another message, then the message is not sent. In
this case (1009) the switch waits at block 1010 for
the next switch cycle (i.e., message transmission
time) and then attempts in block 1007 to send this
message again.
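
The decision made by the reverse switch in FIG. 10 can be sketched as a single function. The function name, its parameters, and the switch IDs 9 and 15 are assumptions for illustration; the error check of blocks 1011 and 1012 is not specified here, so its result is passed in by the caller.

```python
from dataclasses import dataclass, replace


@dataclass
class Reply:
    value: int
    switch_id: int        # ID field 802, carried over from the request
    error: bool = False   # error field 803


def reverse_step(switch_id, reply, combined_links, routed_link, error_found=False):
    """Return (output link, message) pairs after the comparison of block 1002."""
    if reply.switch_id == switch_id:
        # Block 1005: one reply for every request that was combined here.
        return [(link, replace(reply)) for link in combined_links]
    # IDs differ (1004): never decombine; set the error field if an
    # error was detected (1013, 1015), then route the single reply.
    if error_found:
        reply = replace(reply, error=True)
    return [(routed_link, reply)]


reply = Reply(value=5, switch_id=9)
at_combiner = reverse_step(9, reply, combined_links=[607, 608], routed_link=607)
at_other = reverse_step(15, reply, combined_links=[], routed_link=616,
                        error_found=True)
print([link for link, _ in at_combiner], at_other[0][1].error)
```

The switch whose ID matches produces two replies; any other switch forwards exactly one reply and, where an error was detected, marks it.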

The first method of the invention assigns 
unique IDs 1113 to each combining switch. Both 
the forward switch 504 and reverse switch 509 (see 
FIG. 5) IDs 503 and 510 are assigned this unique 
ID 1113. That is the forward 504 and reverse 509 
switches are assigned the same ID. This scheme 
requires that one unique ID "NOC" be reserved to 
indicate that a message is not combined by any 
switch. The PEs set their request message ID 
fields to NOC. The forward switches alter this ID 
(i.e. NOC) only if they combine messages. The 
reverse switches recognize this ID (i.e. NOC) to 
indicate that the reply message is for an uncom- 
bined request message. 

An example of assigning these IDs is shown in 
FIG. 11. For simplicity only switches 1101 to 1112 
are shown in FIG. 11. These switches 1101-1112 
are assigned unique IDs 1 through 12 respectively. 
The NOC ID value chosen for this example is 0. 
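
The assignment of FIG. 11 can be reproduced in a few lines. The helper name assign_ids is hypothetical; the only facts taken from the text are the switch numerals 1101 to 1112, the IDs 1 through 12, and the reserved NOC value 0.

```python
NOC = 0  # reserved ID meaning "not combined by any switch"


def assign_ids(switches):
    """Give each switch a distinct ID, skipping the reserved NOC value."""
    return {sw: i for i, sw in enumerate(switches, start=NOC + 1)}


ids = assign_ids(range(1101, 1113))
print(ids[1101], ids[1112])
```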

The second method envisioned by the present 
invention assigns unique IDs 1215 to each combin- 
ing switch. Both the forward switch 504 and re- 
verse switch 509 (see FIG. 5) IDs 503 and 510 are 
assigned this unique ID 1215. That is the forward 
504 and reverse 509 switches are assigned the 
same ID. This scheme requires that one unique ID 
"NOC" be reserved to indicate that a message is 
not combined by any switch. The PEs set their 
request message ID fields to NOC. The forward 
switches alter this ID (i.e. NOC) only if they com- 
bine messages. The reverse switches recognize 
this ID (i.e. NOC) to indicate that the reply mes- 
sage is for an uncombined request message. 

Each ID assigned by this scheme is construct- 
ed by concatenating two fields 1213 and 1214 (see 
FIG. 12). The first field 1213 is the switch's stage 
number. For example, this field 1213 is set to one 
for switch 1201 and two for switch 1202. The 
second field 1214 is the unique number of the 
switch in its stage. For example, this field 1214 is 
set to one for switch 1201 and two for switch 1204. 



An example of assigning these IDs is shown in
FIG. 12. For simplicity only the switches 1201 to
1212 are shown in FIG. 12. These switches 1201-
1212 are assigned unique IDs 11, 21, 31, 12, 22,
32, 13, 23, 33, 14, 24, and 34 respectively. The
NOC ID value chosen for this example is 00.
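
The concatenated IDs of FIG. 12 can be generated and checked with the sketch below. The function names and the tuple representation of the two fields are assumptions. The "routing error" outcome follows the comparison recited in the claims, where the level IDs match but the switch IDs do not; treating every other mismatch as an ordinary routing case is an assumption of this sketch.

```python
NOC = (0, 0)  # reserved ID "00": not combined by any switch


def fig12_ids(stages=3, per_stage=4, first=1201):
    """Map switch numerals to (stage field 1213, switch field 1214) pairs."""
    ids = {}
    for index in range(1, per_stage + 1):
        for stage in range(1, stages + 1):
            ids[first + (index - 1) * stages + (stage - 1)] = (stage, index)
    return ids


def as_text(switch_id):
    """Concatenate the two fields, e.g. (2, 1) becomes "21"."""
    return f"{switch_id[0]}{switch_id[1]}"


def check_reply(switch_id, reply_id):
    """Classify a reply arriving at a reverse switch."""
    if reply_id == NOC:
        return "route"            # the request was never combined
    if reply_id == switch_id:
        return "decombine"
    if reply_id[0] == switch_id[0]:
        return "routing error"    # right stage, wrong switch
    return "route"                # combined at a switch in another stage


ids = fig12_ids()
print([as_text(ids[s]) for s in range(1201, 1213)])
print(check_reply(ids[1205], ids[1202]))
```

Because the stage number is part of the ID, a switch that receives a reply tagged for a different switch in its own stage can tell that the reply was misrouted rather than merely passing through.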

All of the hardware necessary to identify when
two messages received at a given switch node are
susceptible of being combined is well known in
the art and is discussed in both the NYU Ultracomputer
articles (6 and 7) and the RP3 article
(8) referenced previously. The hardware required
for placing any given level or switch node ID in a
message header would be obvious to those skilled
in the art, as well as the means for comparing these
fields with local ID fields at a given switch node
when messages are being returned. The specific
hardware and organization utilized to effect these
obvious functions could take many forms and are
deemed to be obvious to skilled circuit designers.
To show the hardware details of such circuitry
would merely serve to obfuscate the present invention
and has accordingly not been included herein.
The invention thus resides in the architectural
specification and procedures set forth in FIGS. 9
and 10, which are implemented at each switch node
of a Combining/Decombining network in the Forward
and Reverse network switches, respectively.
Thus by adding a small amount of additional
switch/level ID information in the message header
in the forward direction and a small amount of
additional hardware in the reverse network, varying
the relevant header fields and also performing certain
error checking operations, a significantly improved
interconnection network architecture is
achieved.

Also, although the present invention has been
set forth and described indicating that messages
can be combined at only one switch node and
level, it is quite conceivable that with more sophisticated
systems: 1) more than two messages can be
combined at a given switch if n-way switches are
used; and 2) messages could be combined at each
level of the network assuming the combining criteria
were met. Obviously appropriate hardware
would be necessary to do the testing and combining,
and adequate headers would have to be provided
to contain the requisite IDs, error codes, etc.
Also, appropriate logic and storage would be necessary
in each switch to retain the necessary routing
information and all of the data necessary to
decombine and route the decombined messages to
their proper source whether to other preceding
switch nodes or to the originating processors. However,
this is well known in the art, as exemplified
by the NYU Combining Switch articles (6 and 7),
and the RP3 Combining Switch reference (8), for
example.

Claims

1. A method for performing message
combining/decombining in multistage interconnection
networks (MINs) which comprises:

1) assigning a unique identification (ID) number
to every switch in the network;

2) analyzing every message to be transmitted
to the system through each switch from a first
source to determine if it can be combined with
another message from another source; and if so,
performing a combining operation, and

3) appending the unique ID number of the
switch performing the combining operation to the
message header,

4) passing the modified message to the next
point in the network, and repeating steps 1-3 until
the message reaches its destination in memory;

5) placing a response message on the reverse
path of said interconnection network back to
a requesting processor and appending any switch
IDs to the response message which were contained
in the forward message;

6) analyzing said response message at each
switch through which said message is routed to
determine if the message needs to be decombined
at that switch;

7) said analyzing step including comparing
the switch's ID field with the decombining ID field
included in the header, if any; and

8) if the IDs match, decombining the message
and forwarding the separate messages to the
next switch node or to the requesting processors.

2. A method for performing message
combining/decombining in a multistage interconnection
network as set forth in claim 1 including
returning the message without decombining, when
the ID fields do not match, to a single preceding
stage (or processor) in the MIN.

3. A method operable in a message 
combining/decombining multistage interconnection 
network system as set forth in claim 1, including 
setting the switch ID field in the message header to 
an initial no-combination configuration (NOC) when 
a message is presented to the MIN, and resetting 
said NOC field to a particular switch's ID only if a 
combination operation takes place; and 
bypassing the decombination function at any 
switch which a message traverses in the reverse 
direction if the ID field which would be associated 
with that switch is set to NOC. 

4. A method operable in a message
combining/decombining multistage interconnection
network as set forth in claim 3 wherein the step of
inserting a unique switch ID field in a combined
message level includes first concatenating the
unique switch ID field with a unique ID field identifying
the level of the switch in the MIN, and
inserting the concatenated ID in the message header.

5. A method operable in a message
combining/decombining multistage interconnection
network as set forth in claim 3, including generating
and inserting an error detection code field generated
for the unique switch ID field in the message
header, together with the ID field, and utilizing
the error detection code field to verify the ID carried
by a message traversing the MIN in the reverse
direction before performing any decombination
operation.

6. In a combining/decombining multistage interconnection
network (MIN) for communicating between
the processors of a multiprocessor computer
system and a memory system comprising a plurality
of separate memory modules, wherein said MIN
comprises a plurality of individual switch elements
organized into a plurality of multi-switch levels, a
switch in any level being connected only to switches
in adjacent levels, each said switch including
means for determining if two messages can be
combined and, if so, effecting a combining operation
and storing all internal data necessary to effect
subsequent decombining when the combined message
is returned to that switch, the improvement
which comprises a method operable in the MIN
switch architecture of each switch comprising:

1) appending a unique ID to the combined
message's header, when it is "combined", uniquely
and unambiguously identifying the switch which
effected the combining operation;

2) checking the IDs of any "combined" message
returned to the switch to see if the ID of this
message and the switch ID match and, if so;

3) decombining the message and routing the
individual decombined messages in accordance
with data stored in the switch and if not;

4) inhibiting any decombining operation and
routing the message to the next switch node in the
network or to a requesting processor and;

5) setting an error field in the message if an
ID transmission and/or a routing error is detected.

7. A method operable in a MIN switch architecture
as set forth in claim 6 further including appending
a unique level ID to the message header when
a combined message is formed together with the
unique switch ID and comparing both the switch ID
and the level ID when a "combined" message is
received on the return path of the MIN and flagging
a routing error if the level IDs match but the switch
IDs don't.

8. A method operable in a MIN switch architecture
as set forth in claim 7 including initially setting
a unique "no-combined" field (NOC) in all message
ID fields and retaining the NOC configuration until a
combining operation occurs and bypassing any decombination
procedure when any message having
its ID field set to NOC is returned to any switch.

9. A method for performing message
combining/decombining in multistage interconnection
networks (MINs) which comprises:

1) assigning a unique identification (ID) number
to every switch in the network;

2) analyzing every message to be transmitted
to the system through each switch from a first
source to determine if it can be combined with
another message from another source; and if so,
performing a combining operation, and

3) appending the unique ID number of the
switch performing the combining operation to the
message header,

4) passing the so combined message to the
next point in the network, and repeating steps 1-3
until the message reaches its destination in memory;

5) placing a response message from memory
on the reverse path of the MIN and appending
all header information from the outgoing message
to the response message;

6) comparing the response message's ID
with the ID of the receiving switch;

7) if IDs match in step 6, decombining the
messages and

8) routing the decombined messages to the
processors via internal information stored in the
switch;

9) if the IDs compared in step 6 do not
match, checking to see if a message is routed
incorrectly, or there is an error in the ID based on
an error detection code appended to the ID and;

10) if any error is detected, appending an
error signal to the non-decombined message's
"error" field before it is sent out to the next switch
level or requesting processor.

10. A method operable in a message
combining/decombining multistage interconnection
network system as set forth in claim 9, including
setting the switch ID field in the message header to
an initial no-combination configuration (NOC) when
a message is presented to the MIN, and resetting
said NOC field to a particular switch's ID only if a
combination operation takes place; and bypassing
the decombination function at any switch which a
message traverses in the reverse direction if the ID
field which would be associated with that switch is
set to NOC.

11. A method operable in a message
combining/decombining multistage interconnection
network as set forth in claim 10 wherein the step of
inserting a unique ID field in a combined message
level includes first concatenating a unique switch
ID field with a unique ID field identifying the level
of the switch in the MIN, and inserting the concatenated
ID in the message header.